Table of Contents

A/B-testing

Data upload

There are gaps in the details column; this is normal, since the field carries optional additional information about an event.

Assessment of the correctness of the test

Checking compliance with the terms of reference

Timing

The terms of reference state that the data should contain all events of new users from December 7, 2020 to January 4, 2021; however, the data ends on December 30, so the test was stopped ahead of schedule.

The terms of reference also state that the file contains users who registered in the online store between December 7 and December 21, 2020, yet there are users with registration dates after December 21. Since the test was stopped on December 30, not every user managed to "live" the full 14 days from registration. We will remove users who registered after December 16:
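The filtering step can be sketched as follows; the table and its column names (user_id, first_date) are assumptions for illustration, not the report's actual data:

```python
import pandas as pd

# Hypothetical users table; column names are assumptions for illustration.
users = pd.DataFrame({
    "user_id": [1, 2, 3, 4],
    "first_date": pd.to_datetime(
        ["2020-12-10", "2020-12-16", "2020-12-17", "2020-12-20"]),
})

# The test stopped on 2020-12-30, so only users registered at least
# 14 days earlier (on or before 2020-12-16) have a full lifetime window.
cutoff = pd.Timestamp("2020-12-30") - pd.Timedelta(days=14)
users = users[users["first_date"] <= cutoff]
print(len(users))   # 2
```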

User regions

According to the terms of reference, 15% of new users from the EU should participate in the test, but only 13.73% were recruited. Let's check whether this difference is statistically significant for our test.

H0 - the EU share among users in the sample equals the share in the general population. H1 - the shares differ.

There is no reason to consider the shares different.
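A minimal sketch of such a check: a one-sample z-test for a proportion against the planned 15%. The counts below are assumptions for illustration; the report's actual sample is larger:

```python
from math import erf, sqrt

def norm_cdf(x: float) -> float:
    """CDF of the standard normal distribution."""
    return 0.5 * (1 + erf(x / sqrt(2)))

# Assumed counts for illustration, not the report's actual figures.
n, eu_users = 800, 110
p0 = 0.15                      # EU share planned in the terms of reference
p_hat = eu_users / n           # observed share, 13.75% here

# One-sample z-test for a proportion against the planned value p0
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value = 2 * (1 - norm_cdf(abs(z)))
print(p_value > 0.05)   # True: no reason to reject H0
```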

Number of test participants

The number of test participants exceeds the expected 6,000, and the groups are not split 50/50. Since we are studying conversion, a relative metric, unbalanced groups are acceptable.
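Whether the observed split is consistent with a 50/50 assignment can be checked with a normal approximation to the binomial distribution. The group sizes below are assumptions, not the report's actual counts:

```python
from math import erf, sqrt

# Assumed group sizes for illustration; the report's actual counts differ.
n_a, n_b = 3824, 2877
n = n_a + n_b

# Normal approximation to Binomial(n, 0.5): is the split consistent with 50/50?
z = (n_a - n / 2) / sqrt(n * 0.25)
p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
print(p_value < 0.05)   # True: this split is not 50/50
```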

Time of the test

The test period overlaps with two marketing activities tied to the New Year holidays. Nevertheless, we will keep the events from these ranges: the campaign starts close to the end of the test, and in late December user behavior is hard to call standard anyway because of the pre-holiday surge in activity. The test targets users from the EU, so the CIS New Year Gift Lottery promotion should not affect it.

Audience

Let's make sure there is no overlap with the competing test and that no user appears in both test groups at once. We will also check how evenly users are distributed across the test groups and whether the groups were formed correctly.

The two tests are competing and both affect the funnel. Removing the users who ended up in both tests would reduce the audience by 24.36%, which is too large a data loss.
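Both checks reduce to set operations on participant IDs. A sketch with hypothetical tables (the user IDs and group labels are made up for illustration):

```python
import pandas as pd

# Hypothetical participant tables; IDs and group labels are assumptions.
our_test = pd.DataFrame({
    "user_id": [1, 2, 3, 3, 4, 5],
    "group":   ["A", "A", "A", "B", "B", "B"],
})
competing = pd.DataFrame({"user_id": [4, 5, 6, 7]})

# Users assigned to both groups of our test at the same time
in_both_groups = our_test.groupby("user_id")["group"].nunique()
in_both_groups = in_both_groups[in_both_groups > 1].index.tolist()

# Users who also participate in the competing test, and the share we
# would lose by dropping them
overlap = set(our_test["user_id"]) & set(competing["user_id"])
share_lost = len(overlap) / our_test["user_id"].nunique()

print(in_both_groups, sorted(overlap), round(share_lost, 2))
```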

Distribution of users by registration date between groups

The overall pattern is the same in both groups.

Distribution of participants by region between groups

The test targeted an EU audience, so it is expected that the majority of users come from there. Users from other regions apparently got in due to an error in the group-recruitment mechanism. Let's filter the test participants so that only users from the EU remain.

Distribution of participants by device between groups

The distribution is uniform across the devices.

Distribution of events between groups by days of the week

Most of the events occur on Monday.

Exploratory data analysis

Number of events per user in A/B samples

The distributions are similar. Let's use the t-test to check the equality of the average number of events per user in two groups.

The test showed that there is a difference between the two populations: the average number of events per user in group B is 17% higher than in group A. Perhaps the changes were positive and group B users move further down the sales funnel.
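A sketch of that comparison using Welch's t-test; the per-user event counts below are synthetic (Poisson draws with an assumed ~17% gap in means), since the real distributions come from the event logs:

```python
import numpy as np
from scipy import stats

# Synthetic per-user event counts; the real ones come from the logs.
rng = np.random.default_rng(42)
events_a = rng.poisson(6.0, size=1000)
events_b = rng.poisson(7.0, size=1000)   # roughly 17% more events on average

# Welch's t-test for the equality of mean events per user
t_stat, p_value = stats.ttest_ind(events_a, events_b, equal_var=False)
print(p_value < 0.05)   # True: the means differ
```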

Distribution of the number of events in the samples by day

In group B, users were more active at the beginning of the test. In both groups, activity dropped after the peak on December 21. In group A, we see a sharp spike on December 14; a failure may have occurred that day, which could affect the results of the analysis.

Funnels by groups

Group B shows a worse conversion rate at the second stage of the funnel, while conversion at the last stage is almost the same in both groups. Both groups have more "purchase" events than basket views; perhaps some buyers make a "quick purchase in 1 click", bypassing the basket.
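Counting unique users per funnel stage can be sketched as below. The event log is a tiny hypothetical sample; the stage names follow the report's funnel, and the last step deliberately shows more purchases than basket views:

```python
import pandas as pd

# A tiny hypothetical event log; stage names follow the report's funnel.
events = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4],
    "event": ["login", "product_page", "purchase",
              "login", "product_page", "product_cart", "purchase",
              "login", "product_page", "purchase",
              "login"],
})

funnel = ["login", "product_page", "product_cart", "purchase"]
users_per_stage = (events.groupby("event")["user_id"]
                         .nunique()
                         .reindex(funnel))
# Step-to-step conversion; a value above 1 at the purchase step means
# more buyers than basket viewers (the "quick purchase" effect).
step_conversion = users_per_stage / users_per_stage.shift(1)
print(users_per_stage.tolist())   # [4, 3, 1, 3]
```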

Conclusion on EDA and evaluation of the correctness of the test

  1. The test was stopped earlier than expected.
  2. The time of the test coincided with two New Year's promotions, as well as with the time of the competing test.
  3. The test participants included users from regions other than the EU, but their number is insignificant.
  4. There are users who are in two groups at the same time.
  5. There are differences in the conversions of the two groups - we will check the statistical significance of the differences at the next stage.

Checking the statistical difference of the shares

H0 - there are no differences between the shares, H1 - there are differences between the shares.

To test the hypotheses, we will use the z-test for proportions.
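A minimal implementation of the two-proportion z-test with pooled variance; the stage counts passed in are assumptions for illustration, not the report's actual figures:

```python
from math import erf, sqrt

def norm_cdf(x: float) -> float:
    """CDF of the standard normal distribution."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def proportions_ztest(success_a: int, n_a: int,
                      success_b: int, n_b: int) -> float:
    """Two-sided z-test for the difference between two shares."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return 2 * (1 - norm_cdf(abs((p_a - p_b) / se)))

# Assumed counts for illustration: users who reached product_page
# out of those who logged in, per group.
p_value = proportions_ztest(1200, 2000, 1150, 2050)
print(round(p_value, 3))
```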

General conclusion

  1. The test design contains a number of errors related to the timing of the test and to user recruitment - it is recommended to correct them in future tests.

  2. After the test, group B did not show significant improvements; at the login -> product_page stage, conversion actually worsened.